AWS DE Project
MAREDDY KRISHNA KISHORE REDDY
703284070
MAREDDY.KISHORE@ENQUERO.GENPACT.DIGITAL
Import data to S3
Create Glue DB
aws glue create-database --database-input '{\"Name\":\"awsglueclidb\",\"Description\":\"This_Database_was_created_using_AWS_CLI_Powershell\"}'
Create Glue Crawler for CSV
aws glue create-crawler --name aws-cli-glue-crawler-csv --role AWSGlueServiceRole-glueworkshop --database-name awsglueclidb --table-prefix cli_ --targets '{\"S3Targets\": [{\"Path\": \"s3://awsgluecovid19data/covid-19-testing-data/dataset/csv\"}]}'
Create Glue Crawler for JSON
aws glue create-crawler --name aws-cli-glue-crawler-json --role AWSGlueServiceRole-glueworkshop --database-name awsglueclidb --table-prefix cli_ --targets '{\"S3Targets\": [{\"Path\": \"s3://awsgluecovid19data/covid-19-testing-data/dataset/json\"}]}'
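The two crawler definitions above differ only in the crawler name and the S3 path. As a minimal sketch (not part of the original submission), the argument list can be built in Python with `json.dumps`, which avoids the manual `\"` escaping that PowerShell requires. The helper name `crawler_args` and the constants are introduced here for illustration; the values are copied from the commands above.

```python
import json

DATABASE = "awsglueclidb"
ROLE = "AWSGlueServiceRole-glueworkshop"
BASE_PATH = "s3://awsgluecovid19data/covid-19-testing-data/dataset"


def crawler_args(fmt: str) -> list[str]:
    """Return the argument list for `aws glue create-crawler` for one data format.

    `crawler_args` is a hypothetical helper, not an AWS API.
    """
    # json.dumps produces correctly quoted JSON, so no hand-written escapes are needed.
    targets = json.dumps({"S3Targets": [{"Path": f"{BASE_PATH}/{fmt}"}]})
    return [
        "aws", "glue", "create-crawler",
        "--name", f"aws-cli-glue-crawler-{fmt}",
        "--role", ROLE,
        "--database-name", DATABASE,
        "--table-prefix", "cli_",
        "--targets", targets,
    ]


for fmt in ("csv", "json"):
    # Printed for inspection only; nothing is sent to AWS here.
    print(crawler_args(fmt))
```

To actually create a crawler, the list could be passed to `subprocess.run(crawler_args("csv"), check=True)`, assuming the AWS CLI is installed and credentials are configured.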
Run Crawlers
aws glue start-crawler --name aws-cli-glue-crawler-csv
aws glue start-crawler --name aws-cli-glue-crawler-json
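`start-crawler` returns immediately, so the tables may not exist yet when the command finishes. The sketch below polls `aws glue get-crawler` until the crawler reports `READY` again. It assumes the AWS CLI is on the PATH with working credentials; the function names, polling interval, and retry limit are illustrative choices, not values from the project.

```python
import subprocess
import time
from typing import Callable


def cli_crawler_state(name: str) -> str:
    """Ask the AWS CLI for the crawler state (READY, RUNNING or STOPPING)."""
    result = subprocess.run(
        ["aws", "glue", "get-crawler", "--name", name,
         "--query", "Crawler.State", "--output", "text"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_crawler(
    name: str,
    get_state: Callable[[str], str] = cli_crawler_state,
    poll_seconds: float = 15.0,
    max_polls: int = 40,
) -> bool:
    """Poll until the crawler is READY; return False if max_polls is exhausted."""
    for _ in range(max_polls):
        if get_state(name) == "READY":
            return True
        time.sleep(poll_seconds)
    return False
```

For example, `wait_for_crawler("aws-cli-glue-crawler-csv")` would be called right after the corresponding `start-crawler` command. A crawler that has never been started also reports `READY`, so the call is only meaningful after a run has begun.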
Athena Sample Data
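A sample query like the one shown on this slide can also be submitted from the command line. The following is a sketch only: the table name `cli_csv` is an assumption (the `cli_` prefix plus the crawled folder name), and the results location is a placeholder that must be replaced with a bucket you own. `athena_preview_args` is a hypothetical helper that builds the arguments for `aws athena start-query-execution` without running it.

```python
DATABASE = "awsglueclidb"


def athena_preview_args(table: str, output_location: str, limit: int = 10) -> list[str]:
    """Build the AWS CLI arguments for a small preview query in Athena."""
    query = f'SELECT * FROM "{table}" LIMIT {limit}'
    return [
        "aws", "athena", "start-query-execution",
        "--query-string", query,
        "--query-execution-context", f"Database={DATABASE}",
        "--result-configuration", f"OutputLocation={output_location}",
    ]


# Both values below are assumptions for illustration, not taken from the project.
print(athena_preview_args("cli_csv", "s3://<your-athena-results-bucket>/"))
```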
Task 2: Deploying Glue Locally
Steps followed to deploy and run Glue locally:
1. Pull the AWS Glue Docker image
2. Run the Docker container to start an interactive Jupyter notebook:
Command used:
docker run -it -v ~/.aws:/home/glue_user/.aws -v $env:JUPYTER_WORKSPACE_LOCATION:/home/glue_user/workspace/jupyter_workspace/ -e AWS_PROFILE=$env:AWS_PROFILE_NAME -e AWS_ACCESS_KEY=$env:AWS_ACCESS_KEY -e AWS_REGION="us-east-1" -e AWS_SECRET_ACCESS_KEY=$env:AWS_SECRET_ACCESS_KEY -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888 --name glue_jupyter_lab amazon/aws-glue-libs:glue_libs_4.0.0_image_01 /home/glue_user/jupyter/jupyter_start.sh
Running Glue Job
Steps:
1. Open JupyterLab and open a PySpark notebook
2. Run the Glue job script
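The job script itself is not reproduced on this slide (it is in the GitHub repository). As a rough sketch of what such a job might look like, and not the author's actual script, the function below reads a catalog table and writes it to S3 as Parquet. The source table name `cli_csv` is an assumption; the output path matches the location crawled in the next step. The Glue imports sit inside the function so the code only needs the `awsglue` libraries when it is called, which is the case inside the Glue container.

```python
DATABASE = "awsglueclidb"
SOURCE_TABLE = "cli_csv"  # assumed table name: crawler prefix + folder name
OUTPUT_PATH = "s3://gluelocaloutput/output"


def run_job(table_name: str = SOURCE_TABLE, output_path: str = OUTPUT_PATH) -> int:
    """Copy one Glue catalog table to S3 as Parquet and return the row count."""
    # Imported lazily: these modules exist only where the Glue libraries are installed.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the table that the crawler registered in the Glue Data Catalog.
    frame = glue_context.create_dynamic_frame.from_catalog(
        database=DATABASE, table_name=table_name
    )

    # Write the same records back to S3 in Parquet format.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": output_path},
        format="parquet",
    )
    return frame.count()
```

In a notebook cell inside the container, calling `run_job()` would execute the copy, provided the mounted AWS profile can read the catalog and write to the output bucket.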
Running Crawler on Parquet File
Create Crawler:
aws glue create-crawler --name aws-cli-glue-crawler-pqt --role AWSGlueServiceRole-glueworkshop --database-name awsglueclidb --table-prefix cli_ --targets '{\"S3Targets\": [{\"Path\": \"s3://gluelocaloutput/output\"}]}'
Run Crawler:
aws glue start-crawler --name aws-cli-glue-crawler-pqt
Verifying Data in the Glue Table with Athena
GitHub Repository
https://github.com/kryptoblockskrishna/AWS_DE_Submission/tree/AWS-DE-Project-Files
Thank You